Overview

Dataset statistics

Number of variables: 14
Number of observations: 2173144
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 136288
Duplicate rows (%): 6.3%
Total size in memory: 232.1 MiB
Average record size in memory: 112.0 B
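
The derived figures in this block can be cross-checked from the raw counts. The sketch below recomputes the duplicate percentage and the average record size. It assumes that MiB means 2**20 bytes; the variable names are illustrative and not part of the report.

```python
# Cross-check the derived dataset statistics from the reported raw values.
n_rows = 2_173_144        # Number of observations
n_duplicates = 136_288    # Duplicate rows
total_mib = 232.1         # Total size in memory (assumption: 1 MiB = 2**20 bytes)

duplicate_pct = 100 * n_duplicates / n_rows
avg_record_bytes = total_mib * 2**20 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

An average of about 112 B per record is also what 14 columns of 8 bytes each would give, which is consistent with every column being stored as a 64-bit value.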

Variable types

NUM: 8
CAT: 3
BOOL: 3

Warnings

Duplicates: Dataset has 136288 (6.3%) duplicate rows
Zeros: PJ_IDADE has 36122 (1.7%) zeros
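
As an illustration of what the duplicate-row warning measures, the following sketch groups identical rows and counts the copies beyond the first occurrence. Counting only the extra copies is an assumption; the report does not state which convention it uses. The toy rows and the function name `duplicate_summary` are hypothetical.

```python
from collections import Counter

def duplicate_summary(rows):
    """Return (number of extra copies, {row: count} for rows seen more than once)."""
    counts = Counter(tuple(row) for row in rows)
    repeated = {row: c for row, c in counts.items() if c > 1}
    # Assumption: a group of c identical rows contributes c - 1 duplicates.
    extra_copies = sum(c - 1 for c in repeated.values())
    return extra_copies, repeated

# Hypothetical toy data, not taken from the profiled dataset.
rows = [(1, 33, 1), (0, 25, 2), (1, 33, 1), (1, 33, 1), (0, 25, 2), (0, 63, 1)]
extra, groups = duplicate_summary(rows)
print(extra, groups)
```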

Reproduction

Analysis started: 2020-09-27 18:38:36.221981
Analysis finished: 2020-09-27 18:45:16.405552
Duration: 6 minutes and 40.18 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml
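
A report of this kind can be regenerated along the following lines. This is a minimal sketch, assuming the data is available as a CSV file; the paths, the title and the function name `build_profile` are hypothetical, since the report does not say where the data came from. The imports are placed inside the function so that the definition loads even where pandas-profiling is not installed.

```python
def build_profile(csv_path, html_path, title="Dataset profile"):
    """Profile a CSV file and write the result as an HTML report.

    csv_path, html_path and title are placeholders chosen for illustration.
    """
    # Lazy imports: only needed when the function is actually called.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title=title)
    profile.to_file(html_path)
    return profile
```

With roughly 2.2 million rows the full analysis took several minutes here, so profiling a sample first may be a reasonable way to iterate.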

Variables

CPF
Real number (ℝ≥0)

Distinct: 1134103
Distinct (%): 52.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.341521117e+10
Minimum: 1163
Maximum: 9.999999417e+10
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.6 MiB

Quantile statistics

Minimum: 1163
5-th percentile: 899480179
Q1: 4723516301
median: 1.495771024e+10
Q3: 6.37693105e+10
95-th percentile: 9.236942645e+10
Maximum: 9.999999417e+10
Range: 9.999999301e+10
Interquartile range (IQR): 5.90457942e+10

Descriptive statistics

Standard deviation: 3.275267265e+10
Coefficient of variation (CV): 0.9801725474
Kurtosis: -1.162449459
Mean: 3.341521117e+10
Median Absolute Deviation (MAD): 1.388050776e+10
Skewness: 0.6192915508
Sum: 7.261606566e+16
Variance: 1.072737566e+21
Monotonicity: Not monotonic
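
Several of these statistics are derived from one another, which gives a quick consistency check. The sketch below recomputes the coefficient of variation (standard deviation divided by mean), the interquartile range (Q3 minus Q1) and the variance (standard deviation squared) from the values reported for CPF. The variable names are illustrative.

```python
import math

# Values as reported for CPF.
mean = 3.341521117e10
std = 3.275267265e10
q1 = 4723516301
q3 = 6.37693105e10
reported_cv = 0.9801725474
reported_iqr = 5.90457942e10
reported_variance = 1.072737566e21

cv = std / mean        # coefficient of variation
iqr = q3 - q1          # interquartile range
variance = std ** 2    # variance from the standard deviation

print(math.isclose(cv, reported_cv, rel_tol=1e-6))
print(math.isclose(iqr, reported_iqr, rel_tol=1e-8))
print(math.isclose(variance, reported_variance, rel_tol=1e-6))
```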
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
596634307 | 249 | < 0.1%
8.593706135e+10 | 171 | < 0.1%
3077961326 | 108 | < 0.1%
5.849690417e+10 | 96 | < 0.1%
1.816014681e+10 | 55 | < 0.1%
2.804770265e+10 | 54 | < 0.1%
9.959913929e+10 | 50 | < 0.1%
8086283640 | 48 | < 0.1%
1.372061577e+10 | 48 | < 0.1%
7.504062073e+10 | 43 | < 0.1%
9939804733 | 43 | < 0.1%
6400472380 | 41 | < 0.1%
6996085705 | 41 | < 0.1%
7.483854723e+10 | 40 | < 0.1%
8110434444 | 39 | < 0.1%
5411242770 | 38 | < 0.1%
7.654310227e+10 | 38 | < 0.1%
1.80463648e+10 | 37 | < 0.1%
7.927082477e+10 | 37 | < 0.1%
4.263765177e+10 | 37 | < 0.1%
1047188708 | 36 | < 0.1%
1.057116476e+10 | 35 | < 0.1%
5.359502569e+10 | 35 | < 0.1%
989437310 | 35 | < 0.1%
8.626575076e+10 | 35 | < 0.1%
Other values (1134078) | 2171655 | 99.9%

Minimum values

Value | Count | Frequency (%)
1163 | 1 | < 0.1%
1910 | 1 | < 0.1%
5150 | 1 | < 0.1%
41203 | 7 | < 0.1%
44130 | 1 | < 0.1%
80527 | 1 | < 0.1%
83623 | 1 | < 0.1%
85677 | 2 | < 0.1%
88692 | 1 | < 0.1%
100226 | 2 | < 0.1%

Maximum values

Value | Count | Frequency (%)
9.999999417e+10 | 2 | < 0.1%
9.999989713e+10 | 1 | < 0.1%
9.999972277e+10 | 3 | < 0.1%
9.999932312e+10 | 1 | < 0.1%
9.99992851e+10 | 2 | < 0.1%
9.999897127e+10 | 2 | < 0.1%
9.999891217e+10 | 1 | < 0.1%
9.999880712e+10 | 3 | < 0.1%
9.999863737e+10 | 1 | < 0.1%
9.999862927e+10 | 1 | < 0.1%

CNPJ
Real number (ℝ≥0)

Distinct: 1067950
Distinct (%): 49.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.155042501e+13
Minimum: 455000107
Maximum: 9.7711797e+13
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.6 MiB

Quantile statistics

Minimum: 455000107
5-th percentile: 4.25222e+12
Q1: 1.36643455e+13
median: 2.1801166e+13
Q3: 2.90993245e+13
95-th percentile: 3.5063644e+13
Maximum: 9.7711797e+13
Range: 9.7711342e+13
Interquartile range (IQR): 1.5434979e+13

Descriptive statistics

Standard deviation: 1.112461518e+13
Coefficient of variation (CV): 0.5162132616
Kurtosis: 7.243497182
Mean: 2.155042501e+13
Median Absolute Deviation (MAD): 7.72612e+12
Skewness: 1.344338816
Sum: 4.68321768e+19
Variance: 1.237570629e+26
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
1.379503e+12 | 106 | < 0.1%
1.0609938e+13 | 103 | < 0.1%
9.24845e+12 | 100 | < 0.1%
7.715251e+12 | 92 | < 0.1%
7.774211e+12 | 91 | < 0.1%
5.021025e+12 | 91 | < 0.1%
2.8016077e+13 | 84 | < 0.1%
9.116860002e+11 | 83 | < 0.1%
1.5334477e+13 | 83 | < 0.1%
2.276269e+13 | 82 | < 0.1%
5.586590002e+11 | 80 | < 0.1%
1.0014536e+13 | 77 | < 0.1%
5.605572e+12 | 77 | < 0.1%
1.3265725e+13 | 76 | < 0.1%
1.6808908e+13 | 74 | < 0.1%
1.0627791e+13 | 73 | < 0.1%
3.265392e+12 | 69 | < 0.1%
1.8233963e+13 | 68 | < 0.1%
1.2076338e+13 | 68 | < 0.1%
1.4092821e+13 | 66 | < 0.1%
2.9055907e+13 | 66 | < 0.1%
3.4882134e+13 | 66 | < 0.1%
1.349609e+12 | 65 | < 0.1%
1.5427788e+13 | 64 | < 0.1%
1.342356e+13 | 63 | < 0.1%
Other values (1067925) | 2171177 | 99.9%

Minimum values

Value | Count | Frequency (%)
455000107 | 1 | < 0.1%
3129000153 | 1 | < 0.1%
3251000120 | 1 | < 0.1%
3574000113 | 5 | < 0.1%
5058000128 | 2 | < 0.1%
6486000175 | 1 | < 0.1%
6817000177 | 1 | < 0.1%
8151000196 | 1 | < 0.1%
9290000134 | 1 | < 0.1%
1.04780001e+10 | 1 | < 0.1%

Maximum values

Value | Count | Frequency (%)
9.7711797e+13 | 2 | < 0.1%
9.7554556e+13 | 1 | < 0.1%
9.7554536e+13 | 2 | < 0.1%
9.7554451e+13 | 1 | < 0.1%
9.7554433e+13 | 1 | < 0.1%
9.7554425e+13 | 1 | < 0.1%
9.7554233e+13 | 2 | < 0.1%
9.7554202e+13 | 7 | < 0.1%
9.7554128e+13 | 2 | < 0.1%
9.7554083e+13 | 1 | < 0.1%

PF_GENERO
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.6 MiB

Value | Count | Frequency (%)
1 | 1089970 | 50.2%
0 | 1083174 | 49.8%

PF_IDADE
Real number (ℝ≥0)

Distinct: 109
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 42.1654345
Minimum: 1
Maximum: 121
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 25
Q1: 33
median: 41
Q3: 50
95-th percentile: 62
Maximum: 121
Range: 120
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.54433315
Coefficient of variation (CV): 0.2737866521
Kurtosis: -0.2961243606
Mean: 42.1654345
Median Absolute Deviation (MAD): 8
Skewness: 0.3683152041
Sum: 91631561
Variance: 133.2716278
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
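Common values

Value | Count | Frequency (%)
38 | 75617 | 3.5%
39 | 74594 | 3.4%
37 | 71047 | 3.3%
40 | 70291 | 3.2%
35 | 69958 | 3.2%
41 | 69334 | 3.2%
34 | 68277 | 3.1%
36 | 67803 | 3.1%
33 | 66579 | 3.1%
42 | 66068 | 3.0%
43 | 64518 | 3.0%
32 | 64391 | 3.0%
44 | 62032 | 2.9%
31 | 59133 | 2.7%
45 | 58581 | 2.7%
46 | 57301 | 2.6%
47 | 54235 | 2.5%
48 | 54114 | 2.5%
30 | 52775 | 2.4%
49 | 52511 | 2.4%
50 | 51404 | 2.4%
29 | 50496 | 2.3%
51 | 47938 | 2.2%
52 | 47200 | 2.2%
28 | 45742 | 2.1%
Other values (84) | 651205 | 30.0%

Minimum values

Value | Count | Frequency (%)
1 | 30 | < 0.1%
2 | 60 | < 0.1%
3 | 78 | < 0.1%
4 | 37 | < 0.1%
5 | 27 | < 0.1%
6 | 49 | < 0.1%
7 | 25 | < 0.1%
8 | 39 | < 0.1%
9 | 56 | < 0.1%
10 | 44 | < 0.1%

Maximum values

Value | Count | Frequency (%)
121 | 7 | < 0.1%
120 | 4 | < 0.1%
116 | 2 | < 0.1%
110 | 5 | < 0.1%
109 | 9 | < 0.1%
104 | 4 | < 0.1%
103 | 1 | < 0.1%
102 | 2 | < 0.1%
101 | 8 | < 0.1%
100 | 9 | < 0.1%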

PJ_PORTE
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.6 MiB

Value | Count | Frequency (%)
1 | 1315634 | 60.5%
2 | 652653 | 30.0%
3 | 204857 | 9.4%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 3
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value | Count | Frequency (%)
1 | 1315634 | 60.5%
2 | 652653 | 30.0%
3 | 204857 | 9.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2173144 | 100.0%

Most frequent Decimal Number characters

Value | Count | Frequency (%)
1 | 1315634 | 60.5%
2 | 652653 | 30.0%
3 | 204857 | 9.4%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2173144 | 100.0%

Most frequent Common characters

Value | Count | Frequency (%)
1 | 1315634 | 60.5%
2 | 652653 | 30.0%
3 | 204857 | 9.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2173144 | 100.0%

Most frequent ASCII characters

Value | Count | Frequency (%)
1 | 1315634 | 60.5%
2 | 652653 | 30.0%
3 | 204857 | 9.4%

PJ_SETOR
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.6 MiB

Value | Count | Frequency (%)
1 | 940949 | 43.3%
2 | 873946 | 40.2%
3 | 350984 | 16.2%
4 | 7265 | 0.3%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value | Count | Frequency (%)
1 | 940949 | 43.3%
2 | 873946 | 40.2%
3 | 350984 | 16.2%
4 | 7265 | 0.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2173144 | 100.0%

Most frequent Decimal Number characters

Value | Count | Frequency (%)
1 | 940949 | 43.3%
2 | 873946 | 40.2%
3 | 350984 | 16.2%
4 | 7265 | 0.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2173144 | 100.0%

Most frequent Common characters

Value | Count | Frequency (%)
1 | 940949 | 43.3%
2 | 873946 | 40.2%
3 | 350984 | 16.2%
4 | 7265 | 0.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2173144 | 100.0%

Most frequent ASCII characters

Value | Count | Frequency (%)
1 | 940949 | 43.3%
2 | 873946 | 40.2%
3 | 350984 | 16.2%
4 | 7265 | 0.3%

PJ_IDADE
Real number (ℝ≥0)

ZEROS

Distinct: 71
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.947120393
Minimum: 0
Maximum: 89
Zeros: 36122
Zeros (%): 1.7%
Memory size: 16.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 3
median: 6
Q3: 10
95-th percentile: 25
Maximum: 89
Range: 89
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 7.628429845
Coefficient of variation (CV): 0.9598986133
Kurtosis: 5.516000662
Mean: 7.947120393
Median Absolute Deviation (MAD): 3
Skewness: 2.116795204
Sum: 17270237
Variance: 58.1929419
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
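Common values

Value | Count | Frequency (%)
2 | 257388 | 11.8%
3 | 199669 | 9.2%
1 | 182764 | 8.4%
5 | 182341 | 8.4%
4 | 180015 | 8.3%
6 | 153586 | 7.1%
7 | 148616 | 6.8%
8 | 133965 | 6.2%
10 | 130896 | 6.0%
9 | 128785 | 5.9%
11 | 45518 | 2.1%
12 | 36152 | 1.7%
0 | 36122 | 1.7%
13 | 31245 | 1.4%
14 | 25408 | 1.2%
15 | 24559 | 1.1%
16 | 22757 | 1.0%
18 | 20203 | 0.9%
17 | 20192 | 0.9%
19 | 19638 | 0.9%
20 | 18205 | 0.8%
21 | 18132 | 0.8%
23 | 15903 | 0.7%
22 | 15447 | 0.7%
24 | 14026 | 0.6%
Other values (46) | 111612 | 5.1%

Minimum values

Value | Count | Frequency (%)
0 | 36122 | 1.7%
1 | 182764 | 8.4%
2 | 257388 | 11.8%
3 | 199669 | 9.2%
4 | 180015 | 8.3%
5 | 182341 | 8.4%
6 | 153586 | 7.1%
7 | 148616 | 6.8%
8 | 133965 | 6.2%
9 | 128785 | 5.9%

Maximum values

Value | Count | Frequency (%)
89 | 3 | < 0.1%
79 | 1 | < 0.1%
72 | 2 | < 0.1%
71 | 2 | < 0.1%
70 | 2 | < 0.1%
69 | 2 | < 0.1%
68 | 3 | < 0.1%
64 | 2 | < 0.1%
62 | 4 | < 0.1%
61 | 13 | < 0.1%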

PJ_NUM_FUNCIONARIOS
Real number (ℝ≥0)

Distinct: 101
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.688102583
Minimum: 0
Maximum: 100
Zeros: 554
Zeros (%): < 0.1%
Memory size: 16.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 2
95-th percentile: 10
Maximum: 100
Range: 100
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 5.636453059
Coefficient of variation (CV): 2.096814718
Kurtosis: 86.35112829
Mean: 2.688102583
Median Absolute Deviation (MAD): 0
Skewness: 7.672872698
Sum: 5841634
Variance: 31.76960308
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
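Common values

Value | Count | Frequency (%)
1 | 1567997 | 72.2%
2 | 190972 | 8.8%
3 | 82394 | 3.8%
5 | 64104 | 2.9%
4 | 56858 | 2.6%
10 | 31983 | 1.5%
6 | 30782 | 1.4%
8 | 21205 | 1.0%
7 | 18893 | 0.9%
20 | 13525 | 0.6%
9 | 11924 | 0.5%
12 | 10956 | 0.5%
15 | 9703 | 0.4%
11 | 6201 | 0.3%
13 | 5181 | 0.2%
14 | 4747 | 0.2%
30 | 3998 | 0.2%
16 | 3714 | 0.2%
19 | 3652 | 0.2%
18 | 3472 | 0.2%
17 | 2603 | 0.1%
25 | 2492 | 0.1%
22 | 2120 | 0.1%
21 | 1590 | 0.1%
40 | 1569 | 0.1%
Other values (76) | 20509 | 0.9%

Minimum values

Value | Count | Frequency (%)
0 | 554 | < 0.1%
1 | 1567997 | 72.2%
2 | 190972 | 8.8%
3 | 82394 | 3.8%
4 | 56858 | 2.6%
5 | 64104 | 2.9%
6 | 30782 | 1.4%
7 | 18893 | 0.9%
8 | 21205 | 1.0%
9 | 11924 | 0.5%

Maximum values

Value | Count | Frequency (%)
100 | 678 | < 0.1%
99 | 84 | < 0.1%
98 | 35 | < 0.1%
97 | 36 | < 0.1%
96 | 27 | < 0.1%
95 | 27 | < 0.1%
94 | 5 | < 0.1%
93 | 28 | < 0.1%
92 | 57 | < 0.1%
91 | 12 | < 0.1%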

CANAL_ATENDIMENTO
Real number (ℝ≥0)

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.689896758
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 2
95-th percentile: 5
Maximum: 6
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.189526389
Coefficient of variation (CV): 0.7039047701
Kurtosis: 3.031742746
Mean: 1.689896758
Median Absolute Deviation (MAD): 0
Skewness: 1.924091031
Sum: 3672389
Variance: 1.41497303
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
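Common values

Value | Count | Frequency (%)
1 | 1412289 | 65.0%
2 | 395743 | 18.2%
3 | 147430 | 6.8%
4 | 99831 | 4.6%
5 | 80106 | 3.7%
6 | 37745 | 1.7%

Minimum values

Value | Count | Frequency (%)
1 | 1412289 | 65.0%
2 | 395743 | 18.2%
3 | 147430 | 6.8%
4 | 99831 | 4.6%
5 | 80106 | 3.7%
6 | 37745 | 1.7%

Maximum values

Value | Count | Frequency (%)
6 | 37745 | 1.7%
5 | 80106 | 3.7%
4 | 99831 | 4.6%
3 | 147430 | 6.8%
2 | 395743 | 18.2%
1 | 1412289 | 65.0%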

TEMA_ATENDIMENTO
Real number (ℝ≥0)

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.783945288
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 4
Q3: 5
95-th percentile: 8
Maximum: 9
Range: 8
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.186356353
Coefficient of variation (CV): 0.577798088
Kurtosis: -0.4163584417
Mean: 3.783945288
Median Absolute Deviation (MAD): 1
Skewness: 0.3715626512
Sum: 8223058
Variance: 4.780154101
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
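Common values

Value | Count | Frequency (%)
4 | 624913 | 28.8%
1 | 553769 | 25.5%
5 | 460369 | 21.2%
2 | 173584 | 8.0%
7 | 134489 | 6.2%
9 | 77179 | 3.6%
8 | 62399 | 2.9%
3 | 44418 | 2.0%
6 | 42024 | 1.9%

Minimum values

Value | Count | Frequency (%)
1 | 553769 | 25.5%
2 | 173584 | 8.0%
3 | 44418 | 2.0%
4 | 624913 | 28.8%
5 | 460369 | 21.2%
6 | 42024 | 1.9%
7 | 134489 | 6.2%
8 | 62399 | 2.9%
9 | 77179 | 3.6%

Maximum values

Value | Count | Frequency (%)
9 | 77179 | 3.6%
8 | 62399 | 2.9%
7 | 134489 | 6.2%
6 | 42024 | 1.9%
5 | 460369 | 21.2%
4 | 624913 | 28.8%
3 | 44418 | 2.0%
2 | 173584 | 8.0%
1 | 553769 | 25.5%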

ABORDAGEM_ATENDIMENTO
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.6 MiB

Value | Count | Frequency (%)
0 | 2094501 | 96.4%
1 | 78643 | 3.6%

CATEGORIA_ATENDIMENTO
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.6 MiB

Value | Count | Frequency (%)
0 | 1190005 | 54.8%
2 | 983139 | 45.2%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 2
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value | Count | Frequency (%)
0 | 1190005 | 54.8%
2 | 983139 | 45.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2173144 | 100.0%

Most frequent Decimal Number characters

Value | Count | Frequency (%)
0 | 1190005 | 54.8%
2 | 983139 | 45.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2173144 | 100.0%

Most frequent Common characters

Value | Count | Frequency (%)
0 | 1190005 | 54.8%
2 | 983139 | 45.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2173144 | 100.0%

Most frequent ASCII characters

Value | Count | Frequency (%)
0 | 1190005 | 54.8%
2 | 983139 | 45.2%

INSTRUMENTO_ATENDIMENTO
Real number (ℝ≥0)

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.582195197
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 2
95-th percentile: 3
Maximum: 5
Range: 4
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8780217398
Coefficient of variation (CV): 0.5549389489
Kurtosis: 1.34748415
Mean: 1.582195197
Median Absolute Deviation (MAD): 0
Skewness: 1.405124792
Sum: 3438338
Variance: 0.7709221756
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=5)
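Common values

Value | Count | Frequency (%)
1 | 1382480 | 63.6%
2 | 394381 | 18.1%
3 | 338619 | 15.6%
4 | 37081 | 1.7%
5 | 20583 | 0.9%

Minimum values

Value | Count | Frequency (%)
1 | 1382480 | 63.6%
2 | 394381 | 18.1%
3 | 338619 | 15.6%
4 | 37081 | 1.7%
5 | 20583 | 0.9%

Maximum values

Value | Count | Frequency (%)
5 | 20583 | 0.9%
4 | 37081 | 1.7%
3 | 338619 | 15.6%
2 | 394381 | 18.1%
1 | 1382480 | 63.6%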

MEIO_ATENDIMENTO
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.6 MiB

Value | Count | Frequency (%)
0 | 1935443 | 89.1%
1 | 237701 | 10.9%

Interactions

[Figures: 64 interaction plots rendered with Matplotlib v3.3.1; image content not preserved]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
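
The definition above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the report uses; the function name `pearson_r` and the sample data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([3 * x + 7 for x in xs], ys)
print(r, r_rescaled)
```

The function divides by zero if either input is constant, in which case r is undefined.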

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
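
A minimal sketch of that recipe: rank both variables, then apply the Pearson formula to the ranks. Giving tied values the average of their positions is a common convention and is assumed here; the function names and the sample data are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical toy data: a monotonic but nonlinear relationship.
xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]
print(spearman_rho(xs, cubes), pearson_r(xs, cubes))
```

For this example the rank correlation is perfect while Pearson's r stays below 1, which is the behaviour described above.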

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
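
The pair-counting definition can be sketched directly. This is the simple variant that divides by the total number of pairs (often called tau-a); library implementations may use a tie-adjusted variant instead, so results can differ on data with many ties. The function name and the sample data are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A positive product means both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Pairs tied in either variable count as neither.
    return (concordant - discordant) / len(pairs)

# Hypothetical toy data: 10 pairs, of which 2 are discordant.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
tau = kendall_tau_a(xs, ys)
print(tau)
```

The loop is quadratic in the number of observations, so it is only practical for small samples.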

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
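
A sketch of the bias-corrected estimator as it is usually written, computed from a contingency table of observed counts. It assumes every row and column has a non-zero total; the function name and the example tables are hypothetical, and this is not necessarily the exact code the report runs.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# Hypothetical tables: independent counts and perfectly associated counts.
v_independent = cramers_v_corrected([[10, 20], [20, 40]])
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
print(v_independent, v_perfect)
```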

Missing values

[Figures: two missing-value plots rendered with Matplotlib v3.3.1; image content not preserved]

Sample

First rows

CPF | CNPJ | PF_GENERO | PF_IDADE | PJ_PORTE | PJ_SETOR | PJ_IDADE | PJ_NUM_FUNCIONARIOS | CANAL_ATENDIMENTO | TEMA_ATENDIMENTO | ABORDAGEM_ATENDIMENTO | CATEGORIA_ATENDIMENTO | INSTRUMENTO_ATENDIMENTO | MEIO_ATENDIMENTO
02.621927e+091.275835e+1313313101250211
17.664574e+103.391192e+131391211110010
23.122399e+102.724528e+131352231270210
31.926499e+103.353785e+130631411140010
49.267746e+102.838789e+130431232110010
51.078980e+093.592992e+130251201510210
68.455372e+093.346415e+131262316150030
71.051262e+092.749335e+130321231120211
81.162869e+092.312323e+130501151160230
91.277835e+101.071304e+1302522118110210

Last rows

CPF | CNPJ | PF_GENERO | PF_IDADE | PJ_PORTE | PJ_SETOR | PJ_IDADE | PJ_NUM_FUNCIONARIOS | CANAL_ATENDIMENTO | TEMA_ATENDIMENTO | ABORDAGEM_ATENDIMENTO | CATEGORIA_ATENDIMENTO | INSTRUMENTO_ATENDIMENTO | MEIO_ATENDIMENTO
21731341.830029e+102.319734e+131601151120210
21731354.280041e+098.248686e+1212731146140230
21731365.132182e+101.536300e+131561181150020
21731379.085473e+103.137770e+130471221140010
21731388.977295e+102.011362e+130531261350020
21731393.917310e+101.076769e+1314931111140210
21731402.957830e+102.462614e+130643342180010
21731415.244178e+092.697920e+131381231140010
21731421.219942e+101.665197e+131223181360020
21731436.743369e+101.175931e+1304223103540030

Duplicate rows

Most frequent

CPF | CNPJ | PF_GENERO | PF_IDADE | PJ_PORTE | PJ_SETOR | PJ_IDADE | PJ_NUM_FUNCIONARIOS | CANAL_ATENDIMENTO | TEMA_ATENDIMENTO | ABORDAGEM_ATENDIMENTO | CATEGORIA_ATENDIMENTO | INSTRUMENTO_ATENDIMENTO | MEIO_ATENDIMENTO | count
412478.086284e+091.160811e+1313432102025021012
654252.804770e+103.062270e+13062122125021012
371976.996086e+098.656556e+120442113925021011
599411.804636e+103.552998e+13122121125021011
1053488.291265e+101.352614e+121542124325021011
69661.084221e+099.422790e+121503212125021010
114221.765013e+093.224078e+13049222125021010
148452.377626e+092.671682e+13046124125021010
186852.995610e+092.880202e+13144213225021010
437938.811834e+093.099636e+130402134225021010